Abstract: Clustering is an unsupervised learning technique in which a set of objects is partitioned into homogeneous groups. It is a harder task than supervised classification, where the classes are known in advance and used to train the system. The problem becomes harder still when the data to be clustered are sequential. Hidden Markov Models (HMMs) are a tool for modeling sequential data. In this paper, a scheme for HMM-based sequential clustering is proposed and compared with K-Means using the WEKA machine learning tool. The approach is proximity based: the main effort of the clustering process lies in formulating similarity or distance measures between sequences. The proposed HMM-based K-Means is a useful tool for identifying co-expressed genes and biologically relevant groupings of genes and samples. Experimental results demonstrate that HMM-based K-Means outperforms standard K-Means in terms of accuracy, but at the cost of a heavy computational load.

Keywords: Data mining, Clustering, K-Means Clustering, HMM, Distance measure.